MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.

Internal identifier: 000035 (France/Analysis); previous: 000034; next: 000036

Authors: Kévin Vervier [United States]; Pierre Mahé [France]; Jean-Philippe Vert [France]

Source: Methods in molecular biology (Clifton, N.J.) [eISSN 1940-6029]; 2018

RBID : pubmed:30030800

French descriptors

English descriptors

Abstract

Metagenomics is the study of microbial community diversity, especially the uncultured microorganisms by shotgun sequencing environmental samples. As the sequencers throughput and the data volume increase, it becomes challenging to develop scalable bioinformatics tools that reconstruct microbiome structure by binning sequencing reads to reference genomes. Standard alignment-based methods, such as BWA-MEM, provide state-of-the-art performance, but we demonstrate in Vervier et al. (2016) that compositional approaches using nucleotides motifs have faster analysis time, for comparable accuracy. In this work, we describe how to use MetaVW, a scalable machine learning implementation for short sequencing reads binning, based on their k-mers profile. We provide a step-by-step guideline on how we trained the classification models and how it can easily generalize to user-defined reference genomes and specific applications. We also give additional details on what effect parameters in the algorithm have on performances.

DOI: 10.1007/978-1-4939-8561-6_2
PubMed: 30030800
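
The abstract states that MetaVW bins short reads based on their k-mer profile. As a minimal illustration of what such a profile is, the sketch below counts overlapping k-mers in a single read. It is not MetaVW's implementation, which this record does not describe: the function name `kmer_profile`, the default `k = 4`, and the choice to skip windows containing ambiguous bases are all assumptions made for the example.

```python
from collections import Counter


def kmer_profile(read: str, k: int = 4) -> Counter:
    """Count overlapping k-mers in a read (hypothetical helper, not MetaVW code).

    Assumption: windows containing anything other than A, C, G, T
    (for example N) are skipped rather than counted.
    """
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


# Toy read with one ambiguous base; only the windows without N are counted.
profile = kmer_profile("ACGTACGTNACG", k=4)
for kmer, count in sorted(profile.items()):
    print(kmer, count)
```

A classifier of the kind described in the abstract would take vectors like this as input features, one per read; how MetaVW encodes and hashes them is not covered here.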



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.</title>
<author>
<name sortKey="Vervier, Kevin" sort="Vervier, Kevin" uniqKey="Vervier K" first="Kévin" last="Vervier">Kévin Vervier</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Psychiatry, University of Iowa Hospital and Clinics, Iowa, IA, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Psychiatry, University of Iowa Hospital and Clinics, Iowa, IA</wicri:regionArea>
<placeName>
<region type="state">Iowa</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Mahe, Pierre" sort="Mahe, Pierre" uniqKey="Mahe P" first="Pierre" last="Mahé">Pierre Mahé</name>
<affiliation wicri:level="1">
<nlm:affiliation>Bioinformatics Research Department, BioMérieux, Marcy-l'Étoile, France.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Bioinformatics Research Department, BioMérieux, Marcy-l'Étoile</wicri:regionArea>
<wicri:noRegion>Marcy-l'Étoile</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Vert, Jean Philippe" sort="Vert, Jean Philippe" uniqKey="Vert J" first="Jean-Philippe" last="Vert">Jean-Philippe Vert</name>
<affiliation wicri:level="3">
<nlm:affiliation>MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Fontainebleau, France. Jean-Philippe.Vert@mines-paristech.fr.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Fontainebleau</wicri:regionArea>
<placeName>
<region type="region">Île-de-France</region>
<region type="old region">Île-de-France</region>
<settlement type="city">Fontainebleau</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:30030800</idno>
<idno type="pmid">30030800</idno>
<idno type="doi">10.1007/978-1-4939-8561-6_2</idno>
<idno type="wicri:Area/PubMed/Corpus">000830</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000830</idno>
<idno type="wicri:Area/PubMed/Curation">000830</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000830</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000848</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000848</idno>
<idno type="wicri:Area/Ncbi/Merge">001F08</idno>
<idno type="wicri:Area/Ncbi/Curation">001F08</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001F08</idno>
<idno type="wicri:Area/Main/Merge">000904</idno>
<idno type="wicri:Area/Main/Curation">000901</idno>
<idno type="wicri:Area/Main/Exploration">000901</idno>
<idno type="wicri:Area/France/Extraction">000035</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.</title>
<author>
<name sortKey="Vervier, Kevin" sort="Vervier, Kevin" uniqKey="Vervier K" first="Kévin" last="Vervier">Kévin Vervier</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Psychiatry, University of Iowa Hospital and Clinics, Iowa, IA, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Psychiatry, University of Iowa Hospital and Clinics, Iowa, IA</wicri:regionArea>
<placeName>
<region type="state">Iowa</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Mahe, Pierre" sort="Mahe, Pierre" uniqKey="Mahe P" first="Pierre" last="Mahé">Pierre Mahé</name>
<affiliation wicri:level="1">
<nlm:affiliation>Bioinformatics Research Department, BioMérieux, Marcy-l'Étoile, France.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Bioinformatics Research Department, BioMérieux, Marcy-l'Étoile</wicri:regionArea>
<wicri:noRegion>Marcy-l'Étoile</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Vert, Jean Philippe" sort="Vert, Jean Philippe" uniqKey="Vert J" first="Jean-Philippe" last="Vert">Jean-Philippe Vert</name>
<affiliation wicri:level="3">
<nlm:affiliation>MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Fontainebleau, France. Jean-Philippe.Vert@mines-paristech.fr.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Fontainebleau</wicri:regionArea>
<placeName>
<region type="region">Île-de-France</region>
<region type="old region">Île-de-France</region>
<settlement type="city">Fontainebleau</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Methods in molecular biology (Clifton, N.J.)</title>
<idno type="eISSN">1940-6029</idno>
<imprint>
<date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Base Sequence</term>
<term>Calibration</term>
<term>Genome, Bacterial</term>
<term>Machine Learning</term>
<term>Metagenomics (methods)</term>
<term>Reproducibility of Results</term>
<term>Sequence Analysis, DNA</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Analyse de séquence d'ADN</term>
<term>Apprentissage machine</term>
<term>Calibrage</term>
<term>Génome bactérien</term>
<term>Logiciel</term>
<term>Métagénomique (méthodes)</term>
<term>Reproductibilité des résultats</term>
<term>Séquence nucléotidique</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Metagenomics</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Base Sequence</term>
<term>Calibration</term>
<term>Genome, Bacterial</term>
<term>Machine Learning</term>
<term>Reproducibility of Results</term>
<term>Sequence Analysis, DNA</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Analyse de séquence d'ADN</term>
<term>Apprentissage machine</term>
<term>Calibrage</term>
<term>Génome bactérien</term>
<term>Logiciel</term>
<term>Métagénomique</term>
<term>Reproductibilité des résultats</term>
<term>Séquence nucléotidique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Metagenomics is the study of microbial community diversity, especially the uncultured microorganisms by shotgun sequencing environmental samples. As the sequencers throughput and the data volume increase, it becomes challenging to develop scalable bioinformatics tools that reconstruct microbiome structure by binning sequencing reads to reference genomes. Standard alignment-based methods, such as BWA-MEM, provide state-of-the-art performance, but we demonstrate in Vervier et al. (2016) that compositional approaches using nucleotides motifs have faster analysis time, for comparable accuracy. In this work, we describe how to use MetaVW, a scalable machine learning implementation for short sequencing reads binning, based on their k-mers profile. We provide a step-by-step guideline on how we trained the classification models and how it can easily generalize to user-defined reference genomes and specific applications. We also give additional details on what effect parameters in the algorithm have on performances.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
<li>États-Unis</li>
</country>
<region>
<li>Iowa</li>
<li>Île-de-France</li>
</region>
<settlement>
<li>Fontainebleau</li>
</settlement>
</list>
<tree>
<country name="États-Unis">
<region name="Iowa">
<name sortKey="Vervier, Kevin" sort="Vervier, Kevin" uniqKey="Vervier K" first="Kévin" last="Vervier">Kévin Vervier</name>
</region>
</country>
<country name="France">
<noRegion>
<name sortKey="Mahe, Pierre" sort="Mahe, Pierre" uniqKey="Mahe P" first="Pierre" last="Mahé">Pierre Mahé</name>
</noRegion>
<name sortKey="Vert, Jean Philippe" sort="Vert, Jean Philippe" uniqKey="Vert J" first="Jean-Philippe" last="Vert">Jean-Philippe Vert</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/France/Analysis
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000035 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/France/Analysis/biblio.hfd -nk 000035 | SxmlIndent | more
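
Once a record has been extracted, it can be post-processed outside Dilib. The sketch below is one possible way to list each author with their country in Python, assuming the extracted text has the same shape as the XML record shown on this page and carries no XML declaration. The record uses the `wicri:` and `nlm:` prefixes without declaring them, so a standard parser rejects it as is; the example binds them to placeholder URIs, which are invented for illustration. The names `parse_record` and `authors_with_country` are hypothetical helpers, not part of Dilib.

```python
import xml.etree.ElementTree as ET

# Assumption: these URIs are placeholders invented for this sketch. The record
# does not declare its wicri: and nlm: prefixes, so we bind them ourselves.
PLACEHOLDER_NS = {"wicri": "urn:example:wicri", "nlm": "urn:example:nlm"}


def parse_record(xml_text: str) -> ET.Element:
    """Wrap the record in an element that declares the missing prefixes."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    wrapped = f"<wrapper {decls}>{xml_text}</wrapper>"
    return ET.fromstring(wrapped).find("record")


def authors_with_country(record: ET.Element) -> list:
    """Return (author name, country) pairs from the titleStmt block."""
    pairs = []
    for author in record.findall(".//titleStmt/author"):
        name = author.findtext("name")
        country = author.findtext("affiliation/country")
        pairs.append((name, country))
    return pairs


# Shortened sample in the same shape as the record on this page.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en">MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.</title>
<author>
<name sortKey="Vert, Jean Philippe" first="Jean-Philippe" last="Vert">Jean-Philippe Vert</name>
<affiliation wicri:level="3">
<country xml:lang="fr">France</country>
</affiliation>
</author>
</titleStmt></fileDesc></teiHeader></TEI></record>"""

record = parse_record(SAMPLE)
pairs = authors_with_country(record)
for name, country in pairs:
    print(f"{name}: {country}")
```

In practice `SAMPLE` would be replaced by the text produced by the extraction command, for instance read from standard input with `sys.stdin.read()`.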

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    France
   |étape=   Analysis
   |type=    RBID
   |clé=     pubmed:30030800
   |texte=   MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/France/Analysis/RBID.i   -Sk "pubmed:30030800" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/France/Analysis/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021